TRAP-TANDEM: Data-driven Extraction of Temporal Features from Speech

Author

  • Hynek Hermansky
Abstract

Conventional features in automatic speech recognition describe the instantaneous shape of the short-term spectrum of speech. TRAP-TANDEM features instead describe the likelihood of sub-word classes at a given time instant, derived from the temporal trajectories of band-limited spectral densities in the vicinity of that instant. The paper presents some of the rationale behind the data-driven TRAP-TANDEM approach, briefly describes the technique, points to relevant publications, and summarizes the results achieved so far.
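As a rough illustration of what "temporal trajectories of band-limited spectral densities in the vicinity of the given instant" could look like in code, the sketch below collects, for one frame, a window of energies per frequency band. It is not taken from the paper: the function name `band_trajectories`, the `band_energies[frame][band]` layout, the `half_width` parameter, and the edge-clamping rule are all assumptions made for this example. The classifier that would turn each trajectory into sub-word class likelihoods is not shown.

```python
from typing import List


def band_trajectories(
    band_energies: List[List[float]], t: int, half_width: int
) -> List[List[float]]:
    """Return one temporal trajectory per frequency band, centred on frame t.

    band_energies[frame][band] holds a band energy value (layout assumed
    for this sketch). Each trajectory spans frames t - half_width through
    t + half_width; indices past the signal edges reuse the nearest frame.
    """
    n_frames = len(band_energies)
    n_bands = len(band_energies[0])
    trajectories = []
    for band in range(n_bands):
        trajectory = []
        for offset in range(-half_width, half_width + 1):
            # Clamp the frame index so windows near the edges stay full length.
            frame = min(max(t + offset, 0), n_frames - 1)
            trajectory.append(band_energies[frame][band])
        trajectories.append(trajectory)
    return trajectories


# Toy input: 5 frames, 2 bands; the value at (frame f, band b) is 10*f + b.
energies = [[float(10 * f + b) for b in range(2)] for f in range(5)]
result = band_trajectories(energies, t=1, half_width=2)
print(result)
```

Each inner list is the time course of a single band around frame `t`, in contrast to a conventional feature vector, which would be one row of `band_energies` (all bands at a single frame).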


Similar resources

TRAP-TANDEM: data-driven extraction of temporal features from speech - Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on


Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in a spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...


TRAP based features for LVCSR of meeting data

This paper describes the use of temporal pattern (TRAP) feature extraction in large vocabulary continuous speech recognition (LVCSR) of meeting data. Frequency differentiation and local operators are applied to the critical-band speech spectrum. Tests are performed with an HMM recognizer on the ICSI meetings database. We show that TRAP features in combination with standard ones lead to improvement of word-er...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


Tandem representations of spectral envelope and modulation frequency features for ASR

We present a feature extraction technique for automatic speech recognition that uses Tandem representation of short-term spectral envelope and modulation frequency features. These features, derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction, are combined at the phoneme posterior level. Tandem representations derived from these phoneme posterior...




Publication date: 2003